A Linear Solution for Haplotype Perfect Phylogeny
Author
Abstract
Investigations of genetic differences among humans show that genetic diseases are sometimes the result of variations, that is, genetic mutations that occur in more than one percent of a population. The most common of these variations are single nucleotide polymorphisms (SNPs). A complete map of all SNPs in the human genome will be extremely valuable for associating specific haplotypes with specific genetic diseases. Distinguishing the information contained in both haplotypes when analyzing chromosome sequences poses several new computational issues, which collectively form a newly emerging topic of computational biology known as haplotyping. The recent discovery [3, 4, 6, 8] that genomic DNA can be partitioned into long blocks in which genetic recombination has been rare has led to more meaningful mathematical research and, with it, further computational problems. We are interested in the algorithmic implications of using these long blocks of DNA that have not undergone recombination to infer haplotypes in populations. The absence of recombination justifies a model of haplotype evolution: the haplotypes in a population are assumed to evolve along a coalescent, which, under the standard population-genetic assumption of infinite sites, is a rooted tree that forms a perfect phylogeny. The Perfect Phylogeny Haplotyping (PPH) problem, introduced by Gusfield [5], asks: "Given n genotypes, does there exist a set of at most 2n haplotypes such that each genotype is generated by a pair of haplotypes from this set, and such that this set can be derived on a perfect phylogeny?" Gusfield established an O(nm^2)-time algorithm [2], for n genotypes over m SNP sites, that determines whether there is a PPH solution for the input genotypes and produces a linear-space data structure representing all the solutions. He also conjectured that the PPH problem can be solved in linear time [2]. In this paper, we settle Gusfield's conjecture by introducing a linear-time algorithm for the PPH problem.
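To make the PPH problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the linear-time algorithm of the paper, and it runs in exponential time. Several choices here are assumptions rather than details taken from the abstract: genotype sites are encoded as 0 or 1 (homozygous) and 2 (heterozygous), the root of the phylogeny is taken to be the all-zero haplotype, and the perfect phylogeny condition is checked with the classical pairwise column test (no two columns may contain all of the pairs 01, 10 and 11). All function names are hypothetical.

```python
from itertools import product


def generates(h1, h2, g):
    """True if haplotypes h1 and h2 combine into genotype g (2 = heterozygous)."""
    return all((a != b) if s == 2 else (a == b == s)
               for a, b, s in zip(h1, h2, g))


def has_rooted_perfect_phylogeny(haplotypes):
    """Pairwise column test, assuming an all-zero root (an assumption here)."""
    m = len(haplotypes[0]) if haplotypes else 0
    forbidden = {(0, 1), (1, 0), (1, 1)}
    for i in range(m):
        for j in range(i + 1, m):
            seen = {(h[i], h[j]) for h in haplotypes}
            if forbidden <= seen:  # all three pairs occur: no perfect phylogeny
                return False
    return True


def expansions(g):
    """Yield every unordered haplotype pair that generates genotype g."""
    het = [k for k, s in enumerate(g) if s == 2]
    if not het:
        yield tuple(g), tuple(g)
        return
    # Fix the first heterozygous site to 0 in h1 to skip mirrored pairs.
    for bits in product((0, 1), repeat=len(het) - 1):
        h1, h2 = list(g), list(g)
        for k, b in zip(het, (0,) + bits):
            h1[k], h2[k] = b, 1 - b
        yield tuple(h1), tuple(h2)


def brute_force_pph(genotypes):
    """Return one haplotype pair per genotype, or None if no PPH solution exists."""
    for choice in product(*(list(expansions(g)) for g in genotypes)):
        haplotypes = [h for pair in choice for h in pair]
        if has_rooted_perfect_phylogeny(haplotypes):
            return list(choice)
    return None


if __name__ == "__main__":
    print(brute_force_pph([(2, 2), (0, 2)]))
    print(brute_force_pph([(1, 0), (0, 1), (1, 1)]))
```

The search tries every combination of expansions, so it is only usable on tiny inputs; its purpose is to show what a "PPH solution" is, which the O(nm^2) and linear-time algorithms decide without enumeration.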
We prove that a haplotype matrix that has a perfect phylogeny induces isomorphic posets whether its rows or its columns are taken as the vertex set. Moreover, a genotype poset induced from a genotype matrix is a super poset of the haplotype poset induced from the feasible expansion of the genotype poset. After studying the Hasse diagrams of genotype posets, we develop a new graph model and design a linear-time (O(nm)) algorithm to solve the Perfect Phylogeny Haplotyping problem. We believe our model can be further applied to solve other PPH-related problems, such as Pure …
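The abstract does not state how the posets are ordered, so the sketch below rests on an assumption: each column of a binary haplotype matrix is identified with the set of rows holding a 1, and columns are ordered by set containment. Under that assumption, the Hasse diagram consists of the covering pairs, that is, pairs with no third set strictly in between. The function names are hypothetical and this is an illustration of the general idea of a Hasse diagram, not the graph model of the paper.

```python
def column_sets(matrix):
    """Distinct columns of a 0/1 matrix, each as the frozenset of rows holding a 1."""
    m = len(matrix[0]) if matrix else 0
    sets = {frozenset(r for r, row in enumerate(matrix) if row[c] == 1)
            for c in range(m)}
    # Sort for a deterministic order: by size, then by members.
    return sorted(sets, key=lambda s: (len(s), sorted(s)))


def hasse_edges(sets):
    """Covering pairs (small, big): small is a proper subset of big, nothing between."""
    edges = []
    for a in sets:
        for b in sets:
            if a < b and not any(a < c < b for c in sets):
                edges.append((a, b))
    return edges


if __name__ == "__main__":
    haplotypes = [
        (0, 0, 0),
        (1, 0, 0),
        (1, 1, 0),
        (0, 0, 1),
    ]
    for small, big in hasse_edges(column_sets(haplotypes)):
        print(sorted(small), "is covered by", sorted(big))
```

In this example the column sets are nested or disjoint, which is the pattern a matrix with a perfect phylogeny rooted at the all-zero haplotype exhibits, so the diagram has a single covering edge. The quadratic loops are for clarity only.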
Similar resources
Haplotype Block Partitioning and tagSNP Selection under the Perfect Phylogeny Model
Single Nucleotide Polymorphisms (SNPs) are the most usual form of polymorphism in human genome. Analyses of genetic variations have revealed that individual genomes share common SNP-haplotypes. The particular pattern of these common variations forms a block-like structure on human genome. In this work, we develop a new method based on the Perfect Phylogeny Model to identify haplo...
Computational Problems in Perfect Phylogeny Haplotyping: Xor-Genotypes and Tag SNPs
The perfect phylogeny model for haplotype evolution has been successfully applied to haplotype resolution from genotype data. In this study we explore the application of the perfect phylogeny model to other problems in the design and analysis of genetic studies. We consider a novel type of data, xor-genotypes, which distinguish heterozygote from homozygote sites but do not identify the homozygo...
Algorithms for Imperfect Phylogeny Haplotyping (IPPH) with a Single Homoplasy or Recombination Event
The haplotype inference (HI) problem is the problem of inferring 2n haplotype pairs from n observed genotype vectors. This is a key problem that arises in studying genetic variation in populations, for example in the ongoing HapMap project [5]. In order to have a hope of finding the haplotypes that actually generated the observed genotypes, we must use some (implicit or explicit) genetic model ...
Haplotyping with missing data via perfect path phylogenies
Computational methods for inferring haplotype information from genotype data are used in studying the association between genomic variation and medical condition. Recently, Gusfield proposed a haplotype inference method that is based on perfect phylogeny principles. A fundamental problem arises when one tries to apply this approach in the presence of missing genotype data, which is common in pr...
Perfect phylogeny haplotyper: haplotype inferral using a tree model
SUMMARY: We have developed an efficient program, the Perfect Phylogeny Haplotyper (PPH), that takes in unphased population genotype data and determines whether that data can be explained by haplotype pairs that could have evolved on a perfect phylogeny.
Journal title:
Volume, issue
Pages -
Publication date: 2004